About Dataset

OSHA Severe Injury Reports:

Data Dictionary:

| Index | Data Element | Definition |
|---|---|---|
| 1 | ID | Unique number for each record |
| 2 | UPA | UPA code |
| 3 | EventDate | Date of the event |
| 4 | Employer | Name of the employer's organization |
| 5 | Address1 | Address line 1 of the institution |
| 6 | Address2 | Address line 2 of the institution |
| 7 | City | City of the institution |
| 8 | State | State of the institution |
| 9 | Zip | Zip code of the institution |
| 10 | Latitude | Latitude of the institution |
| 11 | Longitude | Longitude of the institution |
| 12 | Primary NAICS | Primary North American Industry Classification System (NAICS) code, which classifies an establishment's business |
| 13 | Hospitalized | 0 means not hospitalized; 1 to 6 are different levels of hospitalization |
| 14 | Amputation | 0 means no amputation; 1 to 2 are different levels of amputation |
| 15 | Inspection | Inspection code |
| 16 | Final Narrative | Final narrative of the accident |
| 17 | Nature | Nature of injury code |
| 18 | NatureTitle | Nature of injury title |
| 19 | Part of Body | Part of body code |
| 20 | Part of Body Title | Part of body title |
| 21 | Event | Event code |
| 22 | EventTitle | Event title |
| 23 | Source | Source of accident code |
| 24 | SourceTitle | Source of accident title |
| 25 | Secondary Source | Secondary source of accident code |
| 26 | Secondary Source Title | Secondary source of accident title |

Link to access the dataset: https://www.osha.gov/severeinjury

1. Data Collection

1.1 Import Libraries For Overview & EDA

1.2 Get The Dataset
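
As a minimal sketch of this step, the reports can be read into a pandas DataFrame. The helper name `load_reports` and the inline `SAMPLE_CSV` are assumptions made for illustration: the sample rows are synthetic and only mimic a few columns from the data dictionary, so the snippet runs without the real download.

```python
import io

import pandas as pd

# Synthetic stand-in for the downloaded CSV; the values are made up.
SAMPLE_CSV = """ID,UPA,EventDate,Employer,City,State,Hospitalized,Amputation
1,100001,1/1/2015,Acme Co,Austin,TEXAS,1,0
2,100002,1/2/2015,Beta LLC,Dayton,OHIO,0,1
3,100003,1/5/2015,Acme Co,Austin,TEXAS,1,0
"""


def load_reports(source):
    """Read the severe injury CSV from a path or a file-like object."""
    # low_memory=False infers each column's dtype from the whole file
    # instead of chunk by chunk, which avoids mixed-type columns.
    return pd.read_csv(source, low_memory=False)


df = load_reports(io.StringIO(SAMPLE_CSV))
print(df.head())
```

With the real file, the same helper would presumably be called with the local path of the CSV downloaded from the OSHA page.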

2. Data Mining

2.1 Data Understanding

Overview Of The Dataset
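
One possible way to get an overview is a per-column summary of dtype, missing values and distinct values. The helper `overview` and the toy frame below are hypothetical and stand in for the full dataset.

```python
import numpy as np
import pandas as pd

# Toy frame with synthetic values, standing in for the real dataset.
df = pd.DataFrame({
    "ID": [1, 2, 3],
    "Address2": [None, "Suite 4", None],
    "Hospitalized": [1.0, 0.0, np.nan],
})


def overview(frame):
    """Summarize dtype, missing count and distinct count per column."""
    return pd.DataFrame({
        "dtype": frame.dtypes.astype(str),
        "missing": frame.isna().sum(),
        "unique": frame.nunique(),  # nunique ignores missing values
    })


summary = overview(df)
print(df.shape)
print(summary)
```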

2.2 Data Cleaning

Clean Data, Data Transform, Feature Selection & Feature Engineering

2.2.1 Columns

2.2.1.1 ID

2.2.1.2 UPA

2.2.1.3 EventDate

2.2.1.4 Employer

2.2.1.5 Address1

2.2.1.6 Address2

2.2.1.7 City

2.2.1.8 State

2.2.1.9 Zip

2.2.1.10 Latitude

2.2.1.11 Longitude

2.2.1.12 Primary NAICS

2.2.1.13 Hospitalized

2.2.1.14 Amputation

2.2.1.15 Inspection

2.2.1.16 Final Narrative

2.2.1.17 Nature

2.2.1.18 NatureTitle

2.2.1.19 Part of Body

2.2.1.20 Part of Body Title

2.2.1.21 Event

2.2.1.22 EventTitle

2.2.1.23 Source

2.2.1.24 SourceTitle

2.2.1.25 Secondary Source

2.2.1.26 Secondary Source Title

2.2.1.27 Event DayOfWeek

2.2.1.28 Event Day

2.2.1.29 Event Month

2.2.1.30 Event Year
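
The four derived columns above (Event DayOfWeek, Event Day, Event Month, Event Year) can presumably be obtained from EventDate. The sketch below assumes the dates are stored as month/day/year strings, which is an assumption about the raw file, and uses synthetic values.

```python
import pandas as pd

# Synthetic dates; the last entry is deliberately invalid.
df = pd.DataFrame({"EventDate": ["1/1/2015", "12/31/2016", "not a date"]})

# Assumed format: month/day/year. errors="coerce" turns strings that
# cannot be parsed into NaT instead of raising an exception.
df["EventDate"] = pd.to_datetime(
    df["EventDate"], format="%m/%d/%Y", errors="coerce"
)

# Calendar parts derived through the .dt accessor.
df["Event DayOfWeek"] = df["EventDate"].dt.day_name()
df["Event Day"] = df["EventDate"].dt.day
df["Event Month"] = df["EventDate"].dt.month
df["Event Year"] = df["EventDate"].dt.year

print(df)
```

Rows whose date could not be parsed keep missing values in all four derived columns, so they can be handled together with the other missing data later.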

2.2.2 Drop Unimportant Columns
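
A hedged sketch of dropping columns follows. Which columns count as unimportant is not stated here, so the list `COLUMNS_TO_DROP` is a hypothetical choice used only to show the call.

```python
import pandas as pd

# Toy frame with synthetic values.
df = pd.DataFrame({
    "ID": [1, 2],
    "UPA": [100001, 100002],
    "Address2": [None, "Suite 4"],
    "Final Narrative": ["Worker fell.", "Finger caught in press."],
    "Hospitalized": [1, 0],
    "Amputation": [0, 1],
})

# Hypothetical selection; "Inspection" is absent from the toy frame.
COLUMNS_TO_DROP = ["ID", "UPA", "Address2", "Final Narrative", "Inspection"]

# errors="ignore" skips names that are not present instead of raising.
reduced = df.drop(columns=COLUMNS_TO_DROP, errors="ignore")
print(reduced.columns.tolist())
```

`drop` returns a new frame, so the original `df` stays available for comparison.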

2.2.3 Drop Missing Data
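
Dropping missing data might look like the sketch below. The `REQUIRED` list is an assumption: limiting the check to selected columns keeps rows that are incomplete only in sparsely filled fields such as Secondary Source.

```python
import numpy as np
import pandas as pd

# Toy frame with synthetic values.
df = pd.DataFrame({
    "City": ["Austin", None, "Dayton", "Miami"],
    "Hospitalized": [1.0, 0.0, np.nan, 1.0],
    "Secondary Source": [np.nan, np.nan, np.nan, 5.0],
})

# Share of missing values per column, useful before deciding what to drop.
missing_share = df.isna().mean()

# Hypothetical list of columns that must be present in every row.
REQUIRED = ["City", "Hospitalized"]
cleaned = df.dropna(subset=REQUIRED).reset_index(drop=True)

print(missing_share)
print(cleaned)
```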

2.3 EDA: Exploratory Data Analysis

2.3.1 Auto EDA with Sweetviz

2.3.1.1 Severe Injuries: Hospitalization

2.3.1.2 Severe Injuries: Amputation

2.3.1.3 Compare Severe Injuries: Hospitalization

2.3.1.4 Compare Severe Injuries: Amputation

2.3.2 EDA With Python Libraries
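
As one illustrative example of manual EDA, the two severity columns can be aggregated per state. The grouping key and the synthetic values are assumptions; the same pattern applies to any categorical column.

```python
import pandas as pd

# Toy frame with synthetic values.
df = pd.DataFrame({
    "State": ["TEXAS", "OHIO", "TEXAS", "TEXAS"],
    "Hospitalized": [1, 0, 1, 2],
    "Amputation": [0, 1, 0, 0],
})

# Sum both severity columns per state, largest hospitalization total first.
by_state = (
    df.groupby("State")[["Hospitalized", "Amputation"]]
    .sum()
    .sort_values("Hospitalized", ascending=False)
)
print(by_state)
```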

3. Build Dataset For Predicting

3.1 Assign Unique Numbers To Each Value In The 'City', 'State' and 'Employer' Columns
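
One way to assign a unique number to each value is `pd.factorize`, sketched below on synthetic rows. The `_code` suffix for the new columns is a hypothetical naming choice, not necessarily the one used in this notebook.

```python
import pandas as pd

# Toy frame with synthetic values.
df = pd.DataFrame({
    "City": ["Austin", "Dayton", "Austin"],
    "State": ["TEXAS", "OHIO", "TEXAS"],
    "Employer": ["Acme Co", "Beta LLC", "Acme Co"],
})

for column in ["City", "State", "Employer"]:
    # factorize numbers values in order of first appearance (0, 1, 2, ...);
    # missing values would receive the code -1.
    codes, uniques = pd.factorize(df[column])
    df[column + "_code"] = codes

print(df)
```

The resulting integers are identifiers only. Their order carries no meaning, which matters for models that treat numeric inputs as ordered quantities.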

3.2 Removing Categorical Columns For Predicting
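
Removing the categorical columns can be sketched with `select_dtypes`, which keeps only numeric columns. The toy frame and the `City_code` column name are assumptions for illustration.

```python
import pandas as pd

# Toy frame with synthetic values: one text column and three numeric ones.
df = pd.DataFrame({
    "City": ["Austin", "Dayton"],
    "City_code": [0, 1],
    "Hospitalized": [1, 0],
    "Latitude": [30.27, 39.76],
})

# include="number" keeps integer and float columns and drops text columns.
numeric_df = df.select_dtypes(include="number")
print(numeric_df.columns.tolist())
```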

End of Part 1